---
title: "Cleaning religious polarization and fractionalization indexes"
---

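Source: the `creg` data set in the `peacesciencer` package, documented at
<http://svmiller.com/peacesciencer/reference/creg.html>. The quoted notes under "Clean" come from that documentation page.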

# Load

```{r}
# load packages
  source("helper-packages.R")

# install svmiller's package if needed
  # install.packages("peacesciencer")
  library(peacesciencer)

# load the data 
  rel_frac_pol_raw <- creg
```

# Clean

"The state codes provided by the CREG project are mostly Correlates of War codes, but with some differences. Summarizing these differences: the state code for Serbia from 1992 to 2013 is actually the Gleditsch-Ward code (340). Russia after the dissolution of the Soviet Union (1991-onward) is 393 and not 365. The Soviet Union has the 365 code. Yugoslavia has the 345 code. The code for Yemen (678) is effectively the Gleditsch-Ward code because it spans the entire post-World War II temporal domain. Likewise, the code for post-unification Germany is the Gleditsch-Ward code (260) as well. The codebook actually says it's 265 (which would be East Germany's code), but this is assuredly a typo based on the data."

"The fractionalization estimates are the familiar Herfindahl-Hirschman concentration index. The polarization formula comes by way of Montalvo and Reynal-Querol (2000), though this book does not appear to be published beyond its placement online. I recommend Montalvo and Reynal-Querol (2005) instead. You can cite Alesina (2003) for the fractionalization measure if you'd like."
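
As a rough illustration of the two measures named in the note above, the sketch below computes a Herfindahl-style fractionalization index and the Reynal-Querol polarization index from a vector of group shares. The function names and the toy shares are hypothetical, and whether CREG applies exactly these formulas is an assumption based on the cited sources, not something checked against the data.

```{r}
# Hypothetical helpers (not part of peacesciencer). Both take a numeric
# vector of group population shares that are assumed to sum to 1.

# Fractionalization: one minus the Herfindahl-Hirschman concentration index.
fractionalization <- function(shares) {
  1 - sum(shares^2)
}

# Polarization in the Montalvo and Reynal-Querol form:
# 1 - sum(((0.5 - p) / 0.5)^2 * p), which simplifies to 4 * sum(p^2 * (1 - p)).
polarization_rq <- function(shares) {
  4 * sum(shares^2 * (1 - shares))
}

# Toy shares, made up for illustration.
two_equal  <- c(0.5, 0.5)
four_equal <- rep(0.25, 4)

fractionalization(two_equal)   # 0.5
polarization_rq(two_equal)     # 1
fractionalization(four_equal)  # 0.75
polarization_rq(four_equal)    # 0.75
```

Moving from two equal groups to four raises fractionalization but lowers polarization, which is why the two indexes are kept as separate variables.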

```{r}
# clean
  rel_frac_pol_clean <- 
    rel_frac_pol_raw %>% 
    mutate(
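      # map CREG state codes to country names, treating them as CoW codes first
      county_common = countryname(countrycode(creg_ccode, origin = 'cown', destination = 'country.name')),
      # override the codes where CREG departs from CoW (see the notes above)
      county_common = 
        case_when(
          creg_ccode == 260 ~ "Germany", # post-unification Germany in CREG (260 is West Germany in CoW)
          creg_ccode == 393 ~ "Russia", # Russia after the dissolution of the Soviet Union
          creg_ccode == 365 ~ "Soviet Union",
          creg_ccode == 340 ~ "Serbia", # Serbia, 1992 to 2013 (Gleditsch-Ward code)
          TRUE ~ county_common)) %>%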
    # keep a single cross-section
    filter(year == 2000) %>%
    select(county_common, contains("frac"), contains("pol")) %>%
    # flag values strictly above the year-2000 median (1 = above, 0 = otherwise, NA stays NA)
    mutate(
        ethfrac_givenyear_above_median = (ethfrac > median(ethfrac, na.rm = TRUE))*1,
        relfrac_givenyear_above_median = (relfrac > median(relfrac, na.rm = TRUE))*1,
        ethpol_givenyear_above_median = (ethpol > median(ethpol, na.rm = TRUE))*1,
        relpol_givenyear_above_median = (relpol > median(relpol, na.rm = TRUE))*1) %>% 
    # prefix every column, including the country name, with "relfp_"
    rename_with(~paste("relfp", .x, sep = "_"))
```
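
The above-median indicators in the chunk above rely on a strict comparison against the median. The minimal sketch below shows how that behaves on made-up values, using a hypothetical helper `above_median()` that mirrors the expression in the `mutate()` call.

```{r}
# Toy values (made up for illustration), including one missing estimate.
toy_frac <- c(0.1, 0.4, NA, 0.9)

# Hypothetical helper mirroring the expression used in the cleaning step:
# 1 if strictly above the median of the non-missing values, 0 otherwise.
above_median <- function(x) {
  (x > median(x, na.rm = TRUE)) * 1
}

above_median(toy_frac)  # 0 0 NA 1
```

The observation equal to the median (0.4) is coded 0, and the missing value stays `NA` instead of being coded 0.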

# Save data

```{r}
  saveRDS(rel_frac_pol_clean, "../cleaned-data/x-8-rel-fractionalization-polarization.rds")
```